From this report we can notice:

* There are no missing values in the dataset
<br>
* The energy consumption values seem correct and are distributed in the range [6400, 8500] (in both the training and testing sets).
<br>
* The energy and enthalpy variables are correlated, especially in the training set.
The correlation between these two variables is 0.480433 in the training set and -0.27703 in the test set => the correlation is weaker, and of opposite sign, in the test set
<br>
* The training set contains only one class (good energy consumption)
<br>
* The testing set contains two classes:
<br>
    good energy consumption
    <br>
    excessive energy consumption
    <br>
=> So we can address this problem as an anomaly detection problem
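
The checks listed above (missing values, value range, correlation) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names `count_missing` and `pearson` and the toy values are assumptions for the example, not the notebook's actual code or data.

```python
import math

def count_missing(values):
    """Count entries that are None or NaN."""
    return sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy values, NOT the dataset analysed in this report.
energy = [6500.0, 6900.0, 7300.0, 7800.0, 8400.0]
enthalpy = [40.0, 45.0, 47.0, 55.0, 60.0]

print("missing:", count_missing(energy), count_missing(enthalpy))
print("range:", min(energy), max(energy))
print("correlation:", round(pearson(energy, enthalpy), 3))
```

Running the same checks separately on the training and test sets is what reveals the difference in correlation between them.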

data visualization

In this case, it is not possible to identify the outliers by investigating one variable at a time.
It is the combination of the energy and enthalpy variables that allows us to easily identify the anomalies.

Build model

With both variables plotted together, it becomes quite easy to visually identify abnormal consumption: the data points located outside the typical distribution.
We have only two variables and can clearly visualize the relation between them, so we can fit the data with a linear regression and then compute the residual of each sample.
By examining the distribution of the residuals, we can choose a cutoff value (threshold).
Every sample whose residual exceeds the cutoff is considered excessive consumption.
For this problem, a linear regression model achieves very good results, so we do not need a more complicated model (an auto-encoder, for example).
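
The fit-then-threshold idea can be sketched as below. This is only an illustration under stated assumptions: energy is regressed on enthalpy (the text does not say which variable is the target), the cutoff of 200 is arbitrary, and the function names and toy values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y, slope, intercept):
    """Observed value minus predicted value for every sample."""
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def flag_excessive(res, cutoff):
    """True where the residual is strictly above the cutoff."""
    return [r > cutoff for r in res]

# Hypothetical toy data, NOT the dataset analysed in this report.
train_enthalpy = [40.0, 45.0, 50.0, 55.0, 60.0]
train_energy = [6800.0, 7010.0, 7190.0, 7420.0, 7580.0]
slope, intercept = fit_line(train_enthalpy, train_energy)

test_enthalpy = [42.0, 52.0, 58.0]
test_energy = [6890.0, 8300.0, 7500.0]
test_res = residuals(test_enthalpy, test_energy, slope, intercept)

# Assumed cutoff, chosen by hand for this toy example.
flags = flag_excessive(test_res, cutoff=200.0)
print(flags)
```

The line is fitted on the training set only, which contains normal consumption, so a large positive residual on a test sample means it consumed much more energy than the fitted relation predicts.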

linear regression

From this figure, we can see that the residual values approximately follow a Gaussian distribution, so we can use the standard deviation to define the cutoff value. For a Gaussian distribution, the share of values within a given distance of the mean is:
1 standard deviation from the mean: 68%
2 standard deviations from the mean: 95%
3 standard deviations from the mean: 99.7%
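
These percentages follow from the Gaussian cumulative distribution: the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). A short check, where `share_within` is a helper name introduced for this example:

```python
import math

def share_within(k):
    """Probability that a Gaussian value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} standard deviation(s): {share_within(k):.1%}")
```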

After visualizing the results for different cutoff values, we found that 1.7*std gives the best results among those tried.
Energy consumption is not related to enthalpy alone: it is also affected by the number of customers and how much time they spend, and it can depend on the sunset time, which differs between summer and winter.
That explains the presence of some outliers in the training set.
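
Comparing several multiples of the standard deviation can be sketched as follows. The residual values are hypothetical, the function names are introduced for this example, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator was used.

```python
import statistics

def std_cutoff(train_residuals, k=1.7):
    """Cutoff equal to k times the standard deviation of the training residuals."""
    return k * statistics.stdev(train_residuals)

def flagged_share(res, cutoff):
    """Fraction of residuals strictly above the cutoff."""
    return sum(r > cutoff for r in res) / len(res)

# Hypothetical training residuals, NOT taken from the report.
train_residuals = [-120.0, -60.0, -30.0, 0.0, 10.0, 40.0, 70.0, 90.0]

for k in (1.0, 1.7, 2.0, 3.0):
    cutoff = std_cutoff(train_residuals, k)
    share = flagged_share(train_residuals, cutoff)
    print(f"k={k}: cutoff={cutoff:.1f}, flagged share of training set={share:.2f}")
```

A smaller multiplier flags more training samples as excessive, so the chosen value is a trade-off between catching anomalies and tolerating the natural outliers described above.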

By choosing more than one cutoff value, we can define several consumption intervals (low, optimal, slightly excessive, too excessive).
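
One possible way to map residuals to such intervals is a sorted list of cutoffs and a binary search. The multipliers (-1.7, 1.7, 3.0), the standard deviation of 100 and the function name `consumption_band` are all assumptions made for this sketch.

```python
from bisect import bisect_left

LABELS = ["low", "optimal", "slightly excessive", "too excessive"]

def consumption_band(residual, cutoffs, labels=LABELS):
    """Return the band of a residual.

    `cutoffs` must be sorted in ascending order and contain one element
    fewer than `labels`. A residual exactly equal to a cutoff stays in
    the lower band, so only values that exceed it move up.
    """
    return labels[bisect_left(cutoffs, residual)]

std = 100.0  # hypothetical standard deviation of the training residuals
cutoffs = [-1.7 * std, 1.7 * std, 3.0 * std]  # assumed multipliers

for r in (-250.0, 20.0, 220.0, 450.0):
    print(r, "->", consumption_band(r, cutoffs))
```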